Smooth maximum

In mathematics, a smooth maximum of a family x_1, \ldots, x_n of numbers is a smooth approximation to the function \max(x_1,\ldots,x_n), meaning a parametric family of functions m_\alpha(x_1,\ldots,x_n) such that for every \alpha the function m_\alpha is smooth, and the family converges to the maximum function as \alpha \to \infty. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: the maximum as the parameter goes to positive infinity and the minimum as the parameter goes to negative infinity; in symbols, m_\alpha \to \max as \alpha \to \infty and m_\alpha \to \min as \alpha \to -\infty. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.


Examples

Boltzmann operator
For large positive values of the parameter \alpha > 0, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}

\mathcal{S}_\alpha has the following properties:

  1. \mathcal{S}_\alpha\to \max as \alpha\to\infty
  2. \mathcal{S}_0 is the arithmetic mean of its inputs
  3. \mathcal{S}_\alpha\to \min as \alpha\to -\infty

The gradient of \mathcal{S}_{\alpha} is closely related to the softmax function and is given by

\nabla_{x_i}\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - \mathcal{S}_\alpha (x_1,\ldots,x_n) \right) \right].

This makes the softmax function useful for optimization techniques that use gradient descent.

This operator is sometimes called the Boltzmann operator, after the Boltzmann distribution.
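
As a concrete illustration, the operator can be evaluated with a few lines of NumPy. This is a minimal sketch; the function name boltzmann_operator and the choice of NumPy are made here for exposition and are not part of the source.

    import numpy as np

    def boltzmann_operator(x, alpha):
        # S_alpha(x) = sum(x_i * exp(alpha*x_i)) / sum(exp(alpha*x_i)),
        # evaluated by shifting the exponents so the largest is 0
        # (the shift cancels between numerator and denominator).
        x = np.asarray(x, dtype=float)
        w = np.exp(alpha * x - np.max(alpha * x))
        return np.sum(x * w) / np.sum(w)

    # boltzmann_operator([1.0, 2.0, 3.0], 10.0)  -> close to 3 (max)
    # boltzmann_operator([1.0, 2.0, 3.0], 0.0)   -> 2.0 (arithmetic mean)
    # boltzmann_operator([1.0, 2.0, 3.0], -10.0) -> close to 1 (min)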


LogSumExp
Another smooth maximum is LogSumExp:

\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp \alpha x_i

This can also be normalized if the x_i are all non-negative, yielding a function with domain [0,\infty)^n and range [0, \infty):

g(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^n \exp x_i - (n-1) \right)

The (n - 1) term corrects for the fact that \exp(0) = 1: it cancels all but one of the zero exponentials, so that g(0, \ldots, 0) = \log 1 = 0.
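
A minimal NumPy sketch of both forms (the names lse and normalized_lse are illustrative choices, not standard), using the usual shift-by-the-maximum trick so that large \alpha x_i does not overflow:

    import numpy as np

    def lse(x, alpha):
        # LSE_alpha(x) = (1/alpha) * log(sum(exp(alpha * x_i)))
        x = np.asarray(x, dtype=float)
        m = np.max(alpha * x)
        return (m + np.log(np.sum(np.exp(alpha * x - m)))) / alpha

    def normalized_lse(x):
        # g(x) = log(sum(exp(x_i)) - (n - 1)), defined for x_i >= 0,
        # so that g(0, ..., 0) = log(1) = 0.
        x = np.asarray(x, dtype=float)
        return np.log(np.sum(np.exp(x)) - (len(x) - 1))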


Mellowmax
The mellowmax operator is defined as follows:
\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \frac{1}{n} \sum_{i=1}^n \exp \alpha x_i
It is a non-expansive operator. As \alpha \to \infty, it acts like a maximum. As \alpha \to 0, it acts like an arithmetic mean. As \alpha \to -\infty, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.
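
A corresponding NumPy sketch (the name mellowmax is used here only for illustration); the only difference from the LogSumExp sketch above is the 1/n factor inside the logarithm:

    import numpy as np

    def mellowmax(x, alpha):
        # mm_alpha(x) = (1/alpha) * log( (1/n) * sum(exp(alpha * x_i)) ),
        # evaluated stably by factoring out the largest exponent.
        # (alpha must be non-zero; the alpha -> 0 limit is the arithmetic mean.)
        x = np.asarray(x, dtype=float)
        m = np.max(alpha * x)
        return (m + np.log(np.mean(np.exp(alpha * x - m)))) / alpha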


Connection between LogSumExp and Mellowmax
LogSumExp and mellowmax are the same function up to a constant offset of \frac{\log n}{\alpha}. For \alpha > 0, LogSumExp is always at least the true maximum, exceeding it by at most \frac{\log n}{\alpha} (which happens when all n arguments are equal) and equaling it exactly when all but one argument is -\infty. Similarly, mellowmax is always at most the true maximum, falling short by at most \frac{\log n}{\alpha} (when all but one argument is -\infty) and equaling it exactly when all n arguments are equal.
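
The constant offset follows directly from the two definitions above:

\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \frac{1}{n} \sum_{i=1}^n \exp \alpha x_i = \frac{1}{\alpha} \left( \log \sum_{i=1}^n \exp \alpha x_i - \log n \right) = \mathrm{LSE}_\alpha(x) - \frac{\log n}{\alpha}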


p-Norm
Another smooth maximum is the p-norm:

\| (x_1, \ldots, x_n) \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^\frac{1}{p}

which converges to \| (x_1, \ldots, x_n) \|_\infty = \max_{1\leq i\leq n} |x_i| as p \to \infty.

An advantage of the p-norm is that it is a norm. As such it is scale invariant (homogeneous): \| (\lambda x_1, \ldots, \lambda x_n) \|_p = |\lambda| \cdot \| (x_1, \ldots, x_n) \|_p , and it satisfies the triangle inequality.
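
An illustrative NumPy sketch (the function name p_norm is chosen here, not taken from the source), factoring out the largest magnitude so that large p does not overflow:

    import numpy as np

    def p_norm(x, p):
        # ||x||_p = ( sum(|x_i|^p) )^(1/p); tends to max(|x_i|) as p grows.
        a = np.abs(np.asarray(x, dtype=float))
        m = np.max(a)
        if m == 0.0:
            return 0.0
        return m * np.sum((a / m) ** p) ** (1.0 / p)

    # p_norm([1.0, -2.0, 3.0], 100) is approximately 3.0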


Smooth maximum unit
The following binary operator is called the Smooth Maximum Unit (SMU):
\begin{align} \textstyle\max_\varepsilon(a, b) &= \frac{a + b + |a - b|_\varepsilon}{2} \\ &= \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2} \end{align} where \varepsilon \geq 0 is a parameter. As \varepsilon \to 0, |\cdot|_\varepsilon \to |\cdot| and thus \textstyle\max_\varepsilon \to \max.
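
A one-line NumPy sketch of the operator (the name smu is illustrative, not standard):

    import numpy as np

    def smu(a, b, eps):
        # max_eps(a, b) = (a + b + sqrt((a - b)**2 + eps)) / 2;
        # reduces to the exact maximum when eps = 0.
        return 0.5 * (a + b + np.sqrt((a - b) ** 2 + eps))

    # smu(1.0, 2.0, 0.0) == 2.0; smu(1.0, 2.0, 0.25) is slightly above 2.0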


